Sentiment Analysis in a Resource Scarce Language: Hindi

Author

  • Vandana Jha
Abstract

People commonly seek others' opinions before making a decision. With the tremendous availability of documents that express opinions on different issues, the challenge is to analyze them and extract useful knowledge. Much work on Sentiment Analysis is available for the English language. Over the last few years, opinion-rich resources have been booming in other languages, hence the need to perform Sentiment Analysis in those languages. In this paper, Sentiment Analysis in Hindi Language (SAHL) is proposed for reviews in the movie domain. It performs 1) preprocessing, such as stopword removal and stemming, on the input data; 2) subjectivity analysis on the preprocessed data, to remove objective sentences that do not contribute to the opinion expressed in the input data; 3) document-level opinion mining to classify documents as positive or negative using two different methods, a machine learning technique and a lexicon-based classification technique; and 4) negation handling with window size consideration to improve classification accuracy. For machine learning, we have used the Naive Bayes classifier, Support Vector Machine and Maximum Entropy techniques. In lexicon-based classification, adjectives are treated as opinion words, and documents are classified according to the polarity of their adjectives. The effectiveness of the proposed approach is confirmed by extensive simulations performed on a large movie dataset.
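The lexicon-based path with negation handling described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tiny `POLARITY` lexicon, the `NEGATIONS` and `STOPWORDS` lists, the function name `classify`, the symmetric window around each opinion word, and the `"neutral"` fallback for documents with no net polarity. Stemming and subjectivity analysis are omitted.

```python
# Toy resources: illustrative assumptions, not the lexicon used in the paper.
POLARITY = {"अच्छी": 1, "शानदार": 1, "बुरी": -1, "बेकार": -1}
NEGATIONS = {"नहीं", "न", "मत"}
STOPWORDS = {"है", "यह", "और"}


def classify(text, window=2):
    """Classify a review by summing adjective polarities.

    An opinion word's polarity is flipped when a negation word occurs
    within `window` tokens on either side of it (an assumed convention).
    """
    # Preprocessing: drop the danda sentence mark, split, remove stopwords.
    tokens = [t for t in text.replace("।", " ").split() if t not in STOPWORDS]

    score = 0
    for i, token in enumerate(tokens):
        polarity = POLARITY.get(token)
        if polarity is None:
            continue  # not an opinion word in the toy lexicon
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        if any(t in NEGATIONS for t in tokens[lo:hi]):
            polarity = -polarity  # negation found inside the window
        score += polarity

    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed fallback; the paper uses two classes


print(classify("फिल्म अच्छी है"))        # "The film is good"
print(classify("फिल्म अच्छी नहीं है"))   # "The film is not good"
```

The window is checked on both sides because a Hindi negation particle such as नहीं often follows the adjective it negates. With `window=0` no negation is ever detected, which makes the effect of the window size easy to compare.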

Similar resources

A Supervised Method for Constructing Sentiment Lexicon in Persian Language

Due to the increasing growth of digital content on the internet and social media, sentiment analysis is one of the emerging fields. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of sentiment lexicon as a valuable language resource is a one of the imp...


Sentiment Classification in Resource-Scarce Languages by using Label Propagation

With the advent of consumer generated media (e.g., Amazon reviews, Twitter, etc.), sentiment classification becomes a heated topic. Previous work heavily relies on a large amount of linguistic resources, which are difficult to obtain in resource-scarce languages. To overcome this problem, we investigate the usefulness of label propagation, which is a graph-based semi-supervised learning method....


Cross Lingual Sentiment Analysis using Modified BRAE

Cross-Lingual Learning provides a mechanism to adapt NLP tools available for label rich languages to achieve similar tasks for label-scarce languages. An efficient cross-lingual tool significantly reduces the cost and effort required to manually annotate data. In this paper, we use the Recursive Autoencoder architecture to develop a Cross Lingual Sentiment Analysis (CLSA) tool using sentence al...


Sentiment analysis methods in Persian text: A survey

With the explosive growth of social media such as Twitter, reviews on e-commerce websites, and comments on news websites, individuals and organizations are increasingly using opinions in these media for their decision making. Sentiment analysis is one of the techniques used to analyze users' opinions in recent years. Persian language has specific features and thereby requires unique meth...


Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification

The sentiment classification performance relies on high-quality sentiment resources. However, these resources are imbalanced in different languages. Cross-language sentiment classification (CLSC) can leverage the rich resources in one language (source language) for sentiment classification in a resource-scarce language (target language). Bilingual embeddings could eliminate the semantic gap bet...




Publication date: 2016